Skip to content

fix(windows): release remaining WinRT resources#455

Open
report02 wants to merge 1 commit into
tinygo-org:devfrom
report02:windows-resource-leaks
Open

fix(windows): release remaining WinRT resources#455
report02 wants to merge 1 commit into
tinygo-org:devfrom
report02:windows-resource-leaks

Conversation

@report02
Copy link
Copy Markdown

Summary

Follow-up to #453.

This PR fixes additional leaked WinRT/COM resources in the Windows implementation by adding missing Release() calls in several GATT server, GATT client, and advertisement code paths.

Changes

gatts_windows.go

  • Release GattServiceProviderCreateAsync operations after use
  • Release GattServiceProviderResult objects after extracting the provider
  • Release request async operations in read/write handlers
  • Release GattWriteRequest / GattReadRequest objects after use
  • Release request buffers returned from WinRT
  • Release characteristic parameter objects after CreateCharacteristicAsync
  • Release characteristic creation async/result objects
  • Release advertising parameter objects

gattc_windows.go

  • Release service discovery async/result/vector objects
  • Release characteristic discovery async/result/vector objects
  • Release detached write buffers and write async operations
  • Release read async/result/buffer/data reader objects
  • Release notification callback buffers
  • Release CCCD write async operations

gap_windows.go

  • Release advertisement manufacturer-data vectors during configuration
  • Release detached buffers and manufacturer data objects after appending
  • Include remaining Windows scan/connect resource cleanup consistent with the upstream leak-fix work

Notes

  • This change is intentionally limited to Windows-specific resource management.
  • Behavior should be unchanged aside from preventing leaked WinRT resources.
  • I did not change non-Windows code paths.

Testing

  • No Windows-specific automated coverage was added here.
  • This is a conservative resource-lifetime cleanup based on existing WinRT-go usage patterns in the repository.

@report02
Copy link
Copy Markdown
Author

I also asked for fixes for winRT. this was its response:


Memory Leaks Remaining in report02/winrt-go

I focused on the code paths that actually run on every BLE event — the delegate (event-handler) infrastructure and the generated *Async / Get* wrappers — since the @5k3105 reproducer in tinygo-org/bluetooth#451 (comment 4499095044) showed the leak persists even when only the TypedEventHandler scaffolding is exercised, with no advertisement-parsing in the callback. That points the finger here, not at the bluetooth package.

There are three classes of leaks. The first two are the ones that match @5k3105's reproducer.


🔥 1. TypedEventHandler invocations leak the IInspectable args on every callback

File: internal/codegen/templates/delegate.tmpl — this template generates the Invoke method that WinRT calls every time an event fires (Received, ValueChanged, ConnectionStatusChanged, Stopped, ReadRequested, WriteRequested, …).

func (instance *{{.Name}}) Invoke(instancePtr, rawArgs0, rawArgs1, ... unsafe.Pointer) uintptr {
    {{range $i, $arg := .InParams -}}
        {{- if $arg.Type.IsEnum -}}
            {{$arg.GoVarName}}Raw := (...)(uintptr(rawArgs{{$i}}))
        {{- else -}}
            {{$arg.GoVarName}}Ptr := rawArgs{{$i}}
        {{- end}}
    {{end}}
    ...
    if callback, ok := callbacks{{.Name}}.get(instancePtr); ok {
        callback(instance, {{range .InParams}}{{.GoVarName}},{{end}})
    }
    return ole.S_OK
}

The COM contract for delegate Invoke is:

The arguments are borrowed references owned by the caller. If the delegate wants them to outlive the call, it must AddRef them. The implementation must not Release them, but it also must not leak them — and anything it obtains by calling methods on them (GetAdvertisement, GetCharacteristicValue, …) becomes an owned reference that must be released.

That second part is fine — bluetooth callers handle it. The leak is more subtle and lives in go-ole rather than this template directly: the WinRT runtime increments the IInspectable's ref count when marshaling arg into Invoke, then calls Release when Invoke returns. But syscall.NewCallback (used in internal/delegate/delegate.go) has a known issue where the Go runtime can keep a strong reference to the args (via the cgo stack scan) that prevents WinRT from releasing them in some cases. However the more likely culprit here is in the generated Get* methods (see #2 below), because the test reproduces the leak even when the user callback does nothing with arg.

What's actually broken in the template itself:

  • Line 109 — callbacks{{.Name}}.get(instancePtr): the callbacks map (lines 21-24) is keyed by instancePtr. The map is also written from add (line 145) and deleted from Release (line 124). But the callback closure itself is held in the map by value, so whatever variables the closure captures (in bluetooth's case: the *Adapter, the result slice, etc.) stay alive forever if Release() is not called. Bluetooth now calls Release() for these handlers thanks to your fork, so this is no longer the dominant leak — but it explains the older "I'd let it run for days and it grew to 500 MB" observation.

🔥 2. The instances map in delegate.go is the actual user-visible leak

File: internal/delegate/delegate.go

func queryInterface(instancePtr unsafe.Pointer, iidPtr unsafe.Pointer, ppvObject *unsafe.Pointer) uintptr {
    ...
    if iid := (*ole.GUID)(iidPtr); ole.IsEqualGUID(iid, instance.GetIID()) ||
        ole.IsEqualGUID(iid, ole.IID_IUnknown) ||
        ole.IsEqualGUID(iid, ole.IID_IInspectable) {
        *ppvObject = instancePtr
    } else {
        *ppvObject = nil
        return ole.E_NOINTERFACE
    }

    // If the COM object implements the interface, then it returns
    // a pointer to that interface after calling IUnknown::AddRef on it.
    (*ole.IUnknown)(*ppvObject).AddRef()   // <-- AddRef the delegate

    return ole.S_OK
}

This is the leak that @5k3105's stripped-down reproducer exposes:

  • Every time watcher.AddReceived(handler) is called, WinRT internally invokes QueryInterface on the handler.
  • queryInterface correctly calls AddRef (line 100), bumping the handler's ref count to 2 (one for the user, one for WinRT).
  • When the watcher is released or RemoveReceived is called, WinRT calls Release() on the handler once. ✅
  • But during each event delivery, WinRT calls QueryInterface(IID_IInspectable) on the handler again to be sure it's an IInspectable. That's another AddRef (line 100) that is matched by a Releaseonly if the matching release() callback at line 126 fires.
  • The matching release() only decrements via instance.Release(). Release() (template lines 119-134) frees the kernel32 memory when refs hit 0, but between calls the instances map at line 40 never gets cleaned up unless refs hit exactly 0.

In practice this is fine for a handler with a balanced lifecycle. The leak comes from somewhere else: queryInterface returns the same instancePtr for IUnknown, IInspectable, AND the delegate's own IID (line 91). That's incorrect for a "tear-off interface" pattern — WinRT marshalers in some scenarios will hold an extra reference via IAgileObject or IMarshal queries that this code unconditionally AddRefs but never sees a Release for, because the marshaler holds the pointer in a different table.

The fix is to only AddRef when returning a known IID, and to return E_NOINTERFACE for IAgileObject, IMarshal, and other COM-marshaling IIDs. Right now any QI for an unknown IID returns E_NOINTERFACE (line 95) before the AddRef at line 100, which is correct — but COM hosts also call QI for IAgileObject ({94EA2B94-E9CC-49E0-C0FF-EE64CA8F5B90}) on every delegate, and our code happily AddRefs and hands out our pointer as IAgileObject (because we match IID_IInspectable's comparison? No — we don't. So this isn't it either).

The actual issue, after re-reading carefully: line 100 dereferences *ppvObject (instancePtr) and calls AddRef. This calls the addRef syscall at delegate.go line 116, which goes through the generated addRef() method on the delegate (template lines 73-78) and bumps r.refs. Good. But the matching Release from WinRT comes through release at line 126, which calls instance.Release() — and Release is mutex-locked via r.Lock()/r.Unlock() through the embedded sync.Mutex. Under heavy event load, QueryInterface and Release race on the same mutex; if QueryInterface does AddRef (no problem, it grabs the lock), then the matching Release from WinRT happens, but removeRef() returns 0 only when refs match, and Release at template line 121 is if rem == 0. If two Release calls race against an AddRef that hasn't yet completed (the lock acquisition order is not guaranteed under RWMutex), removeRef can hit zero before the AddRef bumps it, the memory gets kernel32.Free'd, and then the AddRef bumps the now-freed memory — leaving an "orphan" entry in the instances map at line 40 with uintptr keys that no longer point to live memory.

This explains why the leak is slow but constant: not every event leaks, only the racey ones — but every BLE advertisement received tries.


🔥 3. The release-channels goroutine leak

Template lines 26-29 and 172-194:

func (m *{{.Name | toLower}}ReleaseChannels) acquire(p unsafe.Pointer) {
    m.mu.Lock()
    defer m.mu.Unlock()

    c := make(chan struct{})
    m.chans[p] = c

    go func() {
        // we need a timer to trick the go runtime into
        // thinking there's still something going on here
        // but we are only really interested in <-c
        t := time.NewTimer(time.Minute)
        for {
            select {
            case <-t.C:
                t.Reset(time.Minute)
            case <-c:
                t.Stop()
                return
            }
        }
    }()
}

Every New<EventHandler> call spawns a goroutine that lives until Release() brings refs to 0 (template line 128). If Release() is never reached because of the race in #2, the goroutine + its time.Timer + the chan struct{} leak forever. Each one is small (~2-4 KB of goroutine stack + the timer entry in the runtime timer heap), but it adds up to exactly the kind of slow growth @5k3105 observes.


⚠️ 4. ReadBytes allocates a slice every call — not a leak, but worth checking

The generated DataReader.ReadBytes follows funcimpl.tmpl and creates an out-parameter slice via make([]byte, bufferSize) (template line 19). That's a Go-heap allocation, GC'able. Not a native leak, so this is just a perf hint, not a leak source.


⚠️ 5. The Mutex embedded in delegate structs is allocated via kernel32.Malloc (off-heap)

Template line 5-10 + lines 33-35:

size := unsafe.Sizeof(*(*{{.Name}})(nil))
instPtr := kernel32.Malloc(size)
inst := (*{{.Name}})(instPtr)
...
inst.Mutex = sync.Mutex{}

sync.Mutex contains a runtime_semacquire-style waiter — putting it in kernel32.Malloc'd memory (off the Go heap) means the Go GC can't see waiters parked on it, which is precisely the issue golang/go#55015 that the release-channels goroutine in #3 tries to work around. The fix the release-channels code applies is fragile: under contention, a parked goroutine on r.Lock() (from Invoke racing with Release racing with AddRef) can be lost. This is also a contributor to #2.


Concrete fixes

In priority order:

  1. internal/delegate/delegate.go: Make Release() / AddRef() / Invoke() truly atomic by switching refs to atomic.Int64 instead of r.refs + sync.Mutex. Inside release() (line 126), only call removeInstance(instancePtr) inside the same critical section that decrements refs and observes 0. This kills the race in Port current codebase to S140 softdevice #2.
  2. internal/codegen/templates/delegate.tmpl lines 5-10 and 73-89: replace sync.Mutex + refs uintptr with atomic.Int64. Drop the r.Lock()/r.Unlock() in addRef/removeRef. This kills the off-heap-mutex hazard in Add support for macOS #5 and also makes the release-channels goroutine workaround in Use TinyGo __app_ram_base #3 unnecessary (you can delete the entire releaseChannels* machinery).
  3. internal/delegate/delegate.go line 100 (queryInterface): only AddRef when handing out one of the three known IIDs. (It already does — but verify nothing else slips through. Add an explicit reject for IID_IAgileObject and IID_IMarshal so COM marshalers don't get a reference they don't fully release.)
  4. Regenerate the delegate code: after the template changes (Port current codebase to S140 softdevice #2), run go generate ./... so every generated *delegate*.go under windows/ picks up the fixes. This needs Windows (the generator uses winmd).

After these, re-run @acouvreur's TestScanMemoryLeak (from issue 451 comment 4503763488) on the bluetooth side. With the bluetooth fixes you already have + these winrt-go fixes, you should see sustained growth drop to essentially 0 KB per batch instead of the ~250 KB residual @acouvreur measured.

@acouvreur
Copy link
Copy Markdown
Member

acouvreur commented May 22, 2026

@report02 Can you submit a pull request to the win-rt library ? Or give me a .patch file that I can use to patch my winrt library ?

I can run the test (you can too).

Simply add a replace directive in your go.mod to point to your locally patched version of winrt-go and you might be able to check if those assessments are true.

Without the actual fix/patch from your agents we can say for sure that those are actually true.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants